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Abstract —Massive MIMO is a variant of multiuser MIMO 
where the uumher of hase-statlou auteuuas M is very large 
(typically « 100), and generally much larger thau the number of 
spatially multiplexed data streams (typically « 10). The benefits 
of such approach have beeu iuteuslvely iuvestigated iu the past 
few years, aud all-digital experimeutal implemeutatious have also 
beeu demoustrated. Uufortuuately, the front-eud AID conversiou 
uecessary to drive huudreds of auteuuas, with a sigual baudwldth 
of the order of 10 to 100 MHz, requires very large sampliug bit- 
rate and power consumption. 

In order to reduce such implemeutatiou requiremeuts. Hybrid 
Digital-Aualog architectures have beeu proposed. Iu particular, 
our work iu this paper is motivated by one of such schemes 
named Joint Spatial Division and Multiplexing (JSDM), where 
the downlink precoder (resp., uplink linear receiver) is split into 
the product of a basebaud linear projection (digital) and an RF 
reconfigurable beamforming network (analog), such that only a 
reduced number m <C M of A/D couverters aud RF modula- 
tlou/demodulatiou chaius is ueeded. Iu JSDM, users are grouped 
according to similarity of their channel dominant subspaces, aud 
these groups are separated by the aualog beamforming stage, 
where multiplexing gain in each group is achieved usiug the 
digital precoder. Therefore, it is appareut that extractiug the 
chauuel subspace informatiou of the M-dim chauuel vectors 
from suapshots of m-dlm projections, with m M, plays a 
fuudameutal role iu JSDM implemeutatiou. 

Iu this paper, we develop uovel efficieut algorithms that require 
sampling only m = 0{2'/M) specific array elements according 
to a coprime sampling scheme, and for a given p ^ M, return 
a p-dim beamformer that has a performance comparable with 
the best p-dim beamformer that can be designed from the full 
kuowledge of the exact chauuel covariauce matrix. We assess 
the performauce of our proposed estimators both aualytically 
aud empirically via uumerical slmulatlous. We also demoustrate 
by simulatiou that the proposed subspace estimatiou methods 
provide uear-ideal performauce for a massive MIMO JSDM 
system, by comparing with the case where the user chauuel 
covariauces are perfectly kuowu. 


I. Introduction 

C ONSIDER a multiuser MIMO channel formed by a base- 
station (BS) with M antennas and K single-antenna 
mobile users in a cellular network. Following the current 
massive MIMO approach IM, uplink (UL) and downlink 
(DL) are organized in Time Division Duplexing (TDD), and 
the BS transmit/receive hardware is designed or calibrated in 
order to preserve UL-DL reciprocity ||5j |6l such that the BS 
can estimate the channel vectors of the users from UL training 
signals sent by the users on orthogonal dimensions. Since there 
is no multiuser interference on the UL training phase, in this 
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paper we shall focus on the basic channel estimation problem 
for a single user. 

In massive MIMO systems, the number of antennas M is 
typically much larger than the number of users K scheduled 
to communicate over a given transmission time slot (i.e., 
the number of spatially multiplexed data streams). Letting 
D denote the duration of a time slot (expressed in channel 
uses), tD channel uses for some r € (0,1), are dedicated to 
training and the remaining (1 — t)D channel uses are devoted 
to data transmission, where it is assumed that D is not larger 
than the channel coherence block length, i.e., the number of 
channel uses over which the channel is nearly constant IT]. 
It turns out that for isotropically distributed channel vectors 
with min{M, iC} > D/2, it is optimal to devote a fraction 
T = 1/2 of the slot to channel estimation while serving only 
D/2 out of K users in the remaining half mm 

In many relevant scenarios, the channel vectors are highly 
correlated since the propagation occurs through a small set of 
Angle of Arrivals (AoAs). This correlation can be exploited 
to improve the system multiplexing gain and decrease the 
training overhead. A particularly effective scheme is the Joint 
Space Division and Multiplexing (JSDM) approach proposed 
and analyzed in GHni- JSDM starts from the consideration 
that for a user with a channel vector h G the signal 
covariance matrix S = E[hh'^] is typically low-ranl0 More¬ 
over, according to the well-known and widely accepted Wide- 
Sense Stationary Uncorrelated Scattering (WSSUS) channel 
model, S is invariant over time and frequency. In particular, 
while the small-scale fading has a coherence time between 
0.1s and 10 ms for motion speed between 1 m/s to 10 m/s 
at the carrier frequency of 3 GHz, the time over which the 
channel vector can be considered WSS is of the order of tens 
of seconds, i.e., from 2 to 4 orders of magnitude larger. Hence, 
estimating the signal subspace of a user is a much easier task 
than estimating the instantaneous channel vector h on each 
coherence time slot. This is especially important in mm-wave 
channels (e.g., carrier frequency of the order of 30 GHz) since, 
due to the higher carrier frequency, the Doppler bandwidth 
of these channels is large and therefore D is small, i.e., the 
multiplexing gain of D /2 achieved by estimating the channels 
by TDD on each given slot as in fl] is significantly impaired. 

When the subspace information for the users can be accu¬ 
rately estimated over a long sequence of time slots, JSDM 

* When K > D/2, then groups of D/2 users are scheduled over different 
time slots such that all users achieve a positive throughput (i.e., rate averaged 
over a long sequence of scheduling slots). 

^This is especially true in the case of a tower-mounted BS and/or in the case 
of mm-wave channels, as experimentally confirmed by channel measurements 
(see and references therein). 
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partitions the users into G > 1 groups such that users in each 
group have approximately the same dominant channel sub¬ 
space GHD. The overall multiplexing gain is obtained in two 
stages, as the concatenation of two linear projections. Namely, 
groups are separated by zero-forcing beamforming that uses 
only the group subspace information. Then, additional mul¬ 
tiuser multiplexing gain can be obtained by conventional linear 
precoding applied independently in each group. In this way, 
the system multiplexing gain can be boosted by G such that 
a decrease in D can be compensated by a larger G 112]. 

Furthermore, JSDM lends itself naturally to a Hybrid Digital 
Analog (HDA) implementation, where the group-separating 
beamformer can be implemented in the analog (RF) domain, 
and the multiuser precoding inside each group is implemented 
in the digital (baseband) domain. The analog beamform¬ 
ing projection reduces the dimensionality from M to some 
intermediate dimension m <C M. Then, the resulting m 
inputs (UL) are converted into digital baseband signals, and 
are further processed in the digital domain. This has the 
additional non-trivial advantage that only m M RF chains 
(A/D converters and modulators) are needed, thus reducing 
significantly the massive MIMO BS receiver/transmitter front- 
end complexity and power consumption. 

From what said, it is apparent that a central task at the BS 
side consists in estimating, for each user, a subspace containing 
a signihcant amount of its received signal power. Since in an 
HDA implementation we do not have direct access to all the 
M antennas, but only to m <C M analog output observations, 
we need to estimate this subspace from snapshots of a low-dim 
projection of the signal. 

A. Contribution 

In this paper, we aim to design such a subspace estimator 
for a BS with a large uniform linear array (ULA) with M \ 
antennas. The geometry of the array is shown in Fig. [T] with 
array elements having uniform spacing d. 


Scattering Channel 



Fig. 1; Array configuration in a multi-antenna receiver in the 
presence of a scattering channel with discrete angle of arrivals. 

We assume that the array serves the users in the angular 
range [-^max, 6»max] for some e^ax £ (0,7r/2), and we let 
d = „ . A —r, where A is the wave-length. In general, 
we assume that we can observe only low-dim sketches of 
the received signal via m M linear projections. In the 


case where the projection matrix contains a single non-zero 
element equal to 1 in each row, we recover the case of array 
subsampling as a special case. In particular, we shall consider a 
coprime sampling scheme requiring m = 0{2\/M). Coprime 
subsampling was first developed by Vaidyanathan and Pal in 
IH M, where they showed that for a given spatial span 
for the array, one obtains approximately the same resolution 
as a uniform linear array by nonuniformly sampling only a 
few array elements at coprime locations. We propose sev¬ 
eral algorithms for estimating the signal subspace and cast 
them as convex optimization problems that can be solved 
efficiently. We also compare via simulation the performance 
of our algorithms with other state-of-the-art algorithms in 
the literature. The relevance of the proposed approach for 
JSDM is demonstrated via a representative example, where the 
dominant subspace of users with different channel correlations 
are estimated and grouped according to the Grassmanian 
quantization scheme introduced in ||8l. Then, JSDM is applied 
to the estimated user groups. We compare the achieved sum- 
rate of our scheme with the ideal case, where the users’ 
channel covariances are perfectly known, as in IH, and we 
find that the performance penalty incurred by our proposed 
method is negligible, even for very short training lengths. 
Notation. Throughout the paper, the output of an optimization 
algorithm argmin^/(a;) is denoted by x*. We use T and 
T_(. for the space of a\\ M x M Hermitian Toeplitz and 
Hermitian semi-definite Toeplitz matrices. We always use I 
for the identity matrix, where the dimension may be explicitly 
indicated for the sake of clarity (e.g., Ifc denotes the k x k 
identity matrix). We denote a k x k diagonal matrix with 
k diagonal elements ai,... ,ak with diag(ai ,... ,ak)- We 
define H(M,p) = {Vmxp £ : U'^U = Ip} as the set 

of tall unitary matrices of dimension M x p. For matrices and 
vectors of appropriate dimensions, we define the inner product 
by (K, L) = Tr(KL*^), where Tr denotes the trace operator, 
and define the induced norm by ||K|j = -sj (K, K), also known 
as Frobenius norm for matrices. For an integer /c G Z, we use 
the shorthand notation [fc] for the set of non-negative integers 
{0,1,..., fc — 1}, where the set is empty if /c < 0. 

H. Related Work 

Several works in the literature are related to the problem 
addressed in this paper, which can be summarized in the 
following four categories; Subspace tracking. Low-rank matrix 
recovery, Direction-of-arrival (DoA) estimation, and Multiple 
Measurement Vectors (MMV) problem in compressed sensing 
(CS). For the sake of completeness, in this section we briefly 
review these approaches. 

While in the rest of this paper we shall treat a general 
scattering model, for simplicity of exposition it is convenient 
to focus here on a simple model in which the transmission 
between a user and the BS occurs through p scatterers (see 
Fig[3- One snapshot of the received signal is given by 

p 

y + (1) 

t=i 

where x is the transmitted (training) symbol, wi ~ CA/'(0, of) 
is the channel gain of the f-th multipath component, n ^ 
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CN{Q,(j‘^1m) is the additive white Gaussian noise of the 
receiver antenna, and where a(0) G is the array response 
at AoA 0, whose fc-th component is given by 

[a(0)]fc = . (2) 

According to the WSSUS model, the channel gains for dif¬ 
ferent paths, i.e., are uncorrelated. Without loss of 

generality, we suppose a; = 1 in all training snapshots. Letting 
A = [a(0i),..., a(0p)], we have 

j{t) = Aw(f) + n(<), t G [T], (3) 


we have only a collection of T snapshots X = BY as defined 
before. Let 

^ 1 ^ ^ ^ 

C, = BCpB» (5) 

t=l 

be the sample covariance of the full and projected signal. A 
natural extension of the matrix completion by nuclear-norm 
minimization to our case is readily give by; 

min Tr(M) subject to M G T+, IlCa; — BMB*^|| < e, (6) 

M 

where e is an estimate of the £ 2 - 110 ™ of the error. 


where w(f) = ... ,Wp{t)y for different t G [T] 

are statistically independent. Also, we assume that the AoAs 
invariant over the whole training period of length 
T slots. We gather the received signal’s and subsampled sig¬ 
nal’s snapshots into M xT mati'ix Y = [y(0),..., y(T — 1)], 
and m X T matrix X = [x(0),..., x(T — 1)], where x(f) = 
By(f), and X = BY for the mx M projection matrix B. 
From ([^, the covariance of y(f) is given by 

p 

Cy = ASA^ -G a^M = E (4) 

i=i 

where S = diag(cr^,..., dp) is the covariance matrix of w(f). 


A. Subspace Tracking from Incomplete Observations 

Let Cy = UAU'^ be the singular value decomposition 
(SVD) of Cy, where A = diag(Ai,..., Am) denotes the 
diagonal matrix of singular values (sorted in non-increasing 
order). Denoting by Up the M x p matrix consisting of the 
hrst p columns of U, we have that the columns of Up form 
an orthonormal basis for the signal subspace. The goal of 
subspace tracking from incomplete observations consists in 
estimating this subspace from the noisy low-dim sketches 
x(f) = By(f), revealed to the estimator sequentially for 
f G [T]. The noiseless version of this problem was studied by 
Chi et. al. in ua, proposing the PETRELS algorithm. Another 
algorithm named GROUSE was proposed by Balzano et. al. 
in HU- The main focus of both algorithms is to optimize 
the computational complexity rather than the data size, and 
therefore they are mainly suited to the case where both M 
and T are high. 


B. Low-rank Matrix Recovery 

Eor p M and for a high signal-to-noise ratio (SNR), 
the covariance matrix Cy in Q is nearly low-rank. Recovery 
of low-rank matrices from a collection of a few possibly 
noisy samples is of great importance in signal processing and 
machine learning. Recently, it has been shown that this can be 
achieved via nuclear-norm minimization, which is a convex 
problem and can be efficiently solved ini. Eor a symmetric 
matrix M, the nuclear norm ||M||» is given by the sum of 
the absolute values of the eigen-values of M, and reduces to 
Tr(M) when M is positive semi-dehnite (PSD). In our case. 


C. Direction-of-arrival Estimation and Super-resolution 

Prom ([^, it is seen that the received signal y{t) is a noisy 
superposition of p independent Gaussian sources arriving from 
p different angles. This is the same model studied for direction- 
of-arrival (DoA) estimation. There are two main categories 
of algorithms for DoA estimation; classical super-resolution 
(SR) algorithms such as ESPRIT US) and MUSIC lUl, and 
more recent compressed sensing based algorithms that use the 
angular sparsity of the signal over a discrete grid of AoAs. 
Although grid-based approaches suffer from the mismatch of 
off-grid sources IMl . they have been vastly studied ll2Tlj29) . 
Recently, Candes and Fernandez-Granda oniisD developed 
a SR technique based on total-variation (TV) minimization, 
which inherits the convex optimization computational advan¬ 
tage of compressed sensing. This approach was extended by 
Tan et. al. in 02) to DoA estimation with coprime arrays, when 
the AoAs are sufficiently separated. In a wireless environment, 
the AoAs may be clustered. This implies that the separation 
requirement for the SR setup may not be met. For example, 
often a continuous AoA density function has been observed 
in measurements (e.g., see ll33ll L and is considered in channel 
models (e.g., see El). This represents an obstacle for a 
straightforward application of SR methods (both classical 
El US) and modern EQHUI). Since in this paper we aim 
at estimating the subspace of the signal rather than DoAs, in 
Section IV-D we extend the SR approach, and develop a new 
algorithm for estimating the signal subspace. 


D. Multiple Measurement Vectors (MMV) 

It is seen from Q that, neglecting the measurement noise 
n(£), the signal y{t) has typically a sparse representation 
over the continuous dictionary {Ba(0),0 G [—^maxi^max]}^ 
i.e., only p atoms of the dictionary {Ba(0i)}£^j^ are needed 
to represent the signal. After a suitable discretization of the 
dictionary (e.g., using a discrete grid of AoAs), the problem 
of estimating Y from the collection of snapshots X = BY, 
knowing that each column x(f) of X has the same sparsity 
pattern (i.e., it is a linear combination of the same dictionary 
elements {Ba(0i)}£^j^) is the classical Multiple Measurement 
Vectors (MMV) problem in compressed sensing, which has 
been widely studied in the literature (see e.g., EHESSOl and 
refs, therein). Since (as in our case) the underlying dictionary 
may be continuous, more recently off-grid MMV techniques 
have also been developed ll4TH43l . 
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We will compare the performance of our algorithms with 
a grid-based MMV approach as in where the channel 
coefficients W = [w(0),..., w(T — 1)] are estimated by 

W* = argmin ||M|| 2 ,i subject to ||X — DM|| < e, (7) 

where D = [Ba(0i),..., Ba(0G)] is a quantized dictionary 
over a grid of AoAs of size G, and where the so-called 
f 2 .i-iionti of the matrix M = [mi,..., is defined as 

I|M|| 2 ,i = Lfcilltnill’ where m* e C^, i = 
denote the rows of M. The signal subspace is eventually given 
by span{a(0i) : i G -4}, where A contains the index of the 
“active” columns of D, i.e., those indexed by the support set 
of W*. 

We will also compare our algorithms with a grid-less 
approach inspired by EDia, based on applying atomic-norm 
denoising to the received signal Y. This can be cast as the 
following semi-definite program (SDP) 


(T*,W*, Z*) = argmin Tr(T) + Tr(W) 

TeT+,WGC^xT zgC^xT 


subject to 


T Z 

Z^ w 


AO,|jX-BZ|| <e', (8) 


where T* gives an estimate of the signal covariance matrix 
and where e' is an estimate of ^ 2 -norm of the noise in the 
projected data. 

In both 0 and 0, the computational complexity scales 
with the number of observation snapshots TN This poses 
a problem when the training time T is large. Although an 
ADMM formulation as in B4l is proposed in Il42ll to reduce 
the computational complexity, the parameters of ADMM need 
to be selected very carefully to guarantee convergence. We 
shall see that our algorithms perform equally or better than 
0 and (|8]l and have significantly less complexity for large T, 
since their complexity does not scale with T. 


III. Channel Model and Problem Statement 

More general than in Q, the channel vector may be formed 
by the superposition of a continuum of array responses. In 
order to include this case, we define the AoA scattering func¬ 
tion j{u), which describes the received power density along 
the direction identified by u G [—1,1], where u = ^ 

for 9 G [—0max5 ^max]- We denote the array vector in the u 
domain by a(u), where [a(u)]fe = Then, the channel 

model is given by 


y(t) = J ^ Vj(u) 


u)a.{u)z{u, t)du -\- n(f), 


(9) 


where z(u, t) is a white circularly symmetric Gaussian process 

z{uA)z{u' ,t'Y = — 


with a covariance function E 


u')5t,t'- The covariance matrix of y{f) is also given by 
Cy = j 7(u)a(u)a(M)^(iu-f ct^Im = S-f (10) 


^This scaling depends highly on the specific SDP solver and the structure 
of the matrix, but it is typically at least of the order 0{T^). 


where S = 8(7) := j(u)a(u)a(u)^du denotes the 
covariance matrix of the signal part, and where ct^Im is the 
covariance matrix of the white additive noise. We define the 
received SNR by 

_ Tr(S(7)) _ /.!i7(M)Ha(M)lP^» _ 7 (u)du 
Tr(cr'^lM} Ma'^ tr^ ’ 

where 'y{u)du is the whole received signal power in a 
given array element. For the ULA, S is a Toeplitz matrix 
with [S]ij = [f]i-j, where f is an M-dim vector with 
[f]j, = du for k G [M], and corresponds to 

the A:-th Fourier coefficient of the density 7. 

We define the best p-dim beamforming matrix for the 
covariance matrix S as Vp = arg maxugH(M p) (S,UU'^). 
Letting S = VAV'^ = be the SVD of S, the 

matrix Vp is an M x p tall unitary matrix formed by the first p 
columns of V. The signal power captured by this beamformer 
is given by (S, VpV^) = J2i=i 

In this paper, we are concerned with the estimation of Vp, 
for some appropriately chosen p, from the noisy snapshots of 
the projected channel (sketches) X = BY obtained during 
a training period of length T, as defined at the beginning of 
Section |II] In order to measure the “goodness” of estimators, 
we propose the following performance metric which is relevant 
to the underlying communication problem of JSDM group 
separation beamforming. First, we define the efficiency of the 
best p-dim beamformer by 

(S,VpVpH) Tr(VHSVp) 

Tr(S) Tr(S) ' ^ ’ 


If Pp « 1 for some p <C M, then a significant amount of 
signal’s power is captured by a low-dim beamformer. Let now 
Vp = Vp(X) be an estimator of Vp from the sketches X. 
We define the relative efficiency of Vp as 


P (S,VpVpH) 
" (S,VpVpH) 


= 1 - 


S,VpVH)-(S,VpVH) 

(S,VpVH) 


( 12 ) 


Hence, the efficiency of Vp is given by rjp = FpPp, and 
1 — Fp represents the fraction of signal power lost due to the 
mismatch between the optimal beamformer and its estimate. 
It is immediate to see that Fp G [0,1], where it is desirable to 
make it as close to 1 as possible, in particular for those values 
of p for which pp « 1. 

Remark 1: We shall compare different subspace estimators 
for a given channel statistics, number of antennas and number 
of measurements (i.e., 7, snr, M, and m) according to the 
following procedure: 1) fix some e G (0,1); 2) find minimum 
p such that Pp > 1 — e; 3) compare subspace estimators in 
terms of Fp. This approach is quite different from the classical 
DoA estimation used in array processing (e.g., in radar). There, 
the relevant parameters to be estimated are the AoAs. In 
our problem, we do not really care about discrete angles, 
but only about a good approximation (in terms of captured 
signal power) of the span of the corresponding array response 
vectors. It follows that the problem of identifiability that 
typically arises in DoA estimation when the minimum angular 
spacing is too small, is irrelevant here. This is the reason 
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why we can handle continuous AoA scattering functions 7 , 
in contrast to some SR methods that assume discrete and 
sufficiently spaced AoAs. 0 

IV. Proposed Algorithms for Subspace Estimation 

In this section, we introduce the coprime sampling that 
we will use throughout the paper. We explain the proposed 
algorithms for estimating the signal subspace and provide 
further intuitions and discussions about their performance. 

A. Coprime Sampling Operator 

Let I? be a subset of [M] of size L and consider a ULA 
whose elements are located at id with i G T) (see Fig. [^. The 
array is called a minimum-redundancy linear arrays (MRLA) 
if for every £ G [M], with £ ^ 0 , there are unique elements 
i,i' GV such that £ = i—i'. This implies that M = 
or approximately L « \/2M. Now, consider an arbitrary 
configuration of sensors V C [M\ and let us define the 
difference set 

AI? = {i — i’ ■. i,i’ gT) with i > i'}. (13) 

It is clear that AI? C [M]. We call V a complete cover (CC) 
if AI? = [M]. This implies that for every £ G [M], there is 
at least a (not necessarily unique) pair i,i' GV such that £ = 
i — i'. By this definition, the location of sensors for a MRLA 
builds a CC with a minimum size. For large values of M, it 
is possible to build a CC of size 2sfM by coprime sampling 
iniiii. Let qi, q 2 be coprime numbers (i.e., gcd(gi, 92 ) = 1) 
that are very close to each other, and satisfy qiq 2 « M, such 
that 5i « 52 ~ s/M. Let V be the set of all nonnegative 
integer combinations of 51 and 52 less than or equal to M — 1 , 
i.e., V — Ui=i^ 2 {fc : k G [M], mod{k,qi) = 0}. Note that 
\V\ « 2s/M. We define the covering set of an element k G 
[M] by 

Xk = {{iji')'■ hi'^ V, i > i',i — i'= k}, (14) 

and its size by Cfc = \Xk\. For suitable selection of 51 and 52 
and for sufficiently large M, > 1 for almost all k G [M], 
the set AV is approximately equal to [M], and I? is a CC 
for [M]|^In the rest of the paper, we always assume that V 
is a CC for [M], Suppose the elements di of V are sorted 
in increasing order with di being the i-th largest element in 
the list. Also, we let to = \V\ and let B be the m x M 
binary matrix with elements [B]j = 1 for i G {!,... ,to} 
and zero otherwise. It is immediate to check that BB*^ = 1^. 
We will use B as the projection matrix to produces the low- 
dim observations X = BY. In passing, this has the advantage 
that the projection reduces to array subsampling, or “antenna 
selection”, which is very easy to implement in the analog RL 
domain by simple switches connecting the selected antennas 
to the RF demodulation chains and A/D converters. 

The first most basic property that a projection matrix B 
must satisfy in order to allow for efficient subspace estimation 

'^For small values of M, the set D might not be a CC, however, as we 
will explain the performance of our proposed algorithms will not change 
dramatically as far as the number of uncovered elements in [M] is negligible 
compared with M. 


is identifiability, that is, the associated matrix map must be a 
bijection when restricted to the class of signal covariance ma¬ 
trices generated by the model at hand. For coprime sampling, 
this is ensured by the following result. 

Proposition 1: Let S be an M x M Hermitian Toeplitz 
matrix and let B be the coprime sampling matrix. Then the 
mapping S —> BSB*^ is a bijection. □ 

Proof: Since S is Toeplitz, for any i,j G [M] with i > j, 
we have [ 8 ]^^ = [f]i-j, for some M-dim vector f. Also, as 8 
is Hermitian, f fully specifies 8 . Let i,i' G {!,..■ ,to} with 
i>i'. We can check that 


[B8B'^],,,, = [8]d.,d,, = 


(15) 


As I? is a complete cover for [M], for any k G [M] there 
are di,di> G V such that di — di/ = k, which using •EH) 
implies that [BLB'^Ji = [f]fc. Thus, all the elements of 8 
can be recovered from the low-dim matrix B 8 B'^ and vice- 
versa, thus, the mapping is a bijection. ■ 

Remark 2: Although in this paper, for simplicity of im¬ 
plementation, we focus on a coprime sampling matrix B, 
all the proposed algorithms, except the super-resolution (SR) 


algorithm in Section IV-D can be applied to other sampling 
matrices (e.g., i.i.d. Gaussian matrices). () 


B. Algorithm 1: Approximate Maximum Likelihood (AML) 
Estimator 


For the signal model (j^, we can immediately prove the 
following result. 

Proposition 2: Let be the sample covariance 

of the observations X. Then is a sufficient statistics for 
estimating the signal covariance matrix 8 . □ 

Proof: Recall that = BCyB'^ = B 8 B'^ + 
where we have explicitly used the fact that BB'^ = 1^. As 
the observations X are Gaussian, after some simple algebra 
the likelihood function is given by 


p(X| 8 ) 


exp { - TTr(c,(B 8 BH + } 

det(B 8 BH -p 


(16) 


It follows that the likelihood function depends on X only via 
C^. From the Fischer-Neyman factorization theorem ||45| , it 
follows that Cx is a sufficient statistics. ■ 

We always assume that the noise variance can be 
estimated during the system’s operation. In this section, for 
simplicity of the notation, we suppose that the input signal 
is scaled by and denote the resulting sample covariance 
by Cx, where due to normalization Cj = Cxl<j'^. Then, 
the Maximum-Likelihood (ML) estimator for the normalized 
subsampled data can be written as 8 * = argming^.^,^ ^(S), 
where 8 = 8 /(t^, and where L( 8 ) is the minus log-likelihood 
function given by 


L( 8 ) = logdet(I™ -f B 8 B^^) + Tr(c£(I™ + . 


By direct inspection, we have the following result. 

Proposition 3: L{S) is the sum of a concave function 
icav(8) = logdet(Im -f BBB'^) and a convex function 

□ 


Lvex(8) = Tr(Cj(I^+B8BH)- 
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As L{S) is not convex, local optimization techniques such 
as gradient descent are not guaranteed to converge to the 
globally optimal solution. Since S scales with SNR, it is 
possible to obtain a convex (indeed, linear) approximation of 
the concave function Lcav(S), which is tight especially for low 
SNR. More precisely, we have the following result. 

Proposition 4: Lcav(S) < Tr(BSB'^) for all S G T_|_. 
Moreover, for the low-SNR regime (snr 1), we have 
L, 3 ,(S) = Tr(BSBH) + o(snr). □ 

Proof: See Appendix C-A| ^ ■ 

Proposition 1^ states that for lo\\^SNR, Tr(BSB^) is the 
best linear approximation for Lcav(S), which implies that 

Lapp(S) = Tr(BSBH) + (17) 


is the best convex upper bound for L{S). We define the 
approximate maximum likelihood (AML) estimator for the 
normalized input by S* = argmingg.jj,^ L{S). 

Remark 3: It is interesting to note that this approximation 
is valid independent of the length of the training period T as 
far as snr is sufficiently small. Although the total signal-to- 
noise ratio of the estimation problem increases by increasing 
T, the validity of this approximation only depends on the 
SNR of an individual sample rather than the accumulative 
signal-to-noise ratio of the whole training samples. On the 
other hand, increasing T yields Cx = IE[x(f)x(f)'^] by 

consistency of the sample covariance estimator, and improves 
the estimation. 0 

The next proposition shows that the AML estimation can 
be cast as an SDR 

Proposition 5: Let Lapp(S) = Tr(BSB'^) + Tr(C 2 (Im + 
BSB^)-!) and let Cx = UAU^^ be the SVD of C^. Then 
the AML estimate can be obtained from the following SDP 


(S*,W*) 


= argmin Tr(BMB^) + Tr(W) 
mgt+,w 


subject to 


U + BMB'^ A 
W 


hO, (18) 


where A = = UA^/^. 


□ 


Proof: See Appendix C-B 


Some remarks about optimization problem ( [TSl l are in order. 

Remark 4: Although the optimization algorithm ( [TS] ) gives 
an approximation of the ML estimate for low-SNR regime, it 
does not need the explicit knowledge of SNR. However, an 
estimate of noise level in the array is necessary to scale the 
input data. By Proposition we expect that the performance 
of AML be very close to the performance of the ML in the 
low-SNR regime. <} 

Remark 5: After descaling all the parameters by tr, the 
SDP in ( [TSl l can be equivalently written as 

- Tr(W) 


(S*,W*) = arg min Tr(BMB'^) 4 
mgt+,w 


subject to 


.H 


A 

W 


0 , 


(19) 


resemblance to the atomic norm computation introduced in 
(H for the MMV problem. However, there are also some 
interesting differences. For example, in ( |T^ , the noise variance 
and the sampling operator B directly appear in the SDP 
constraint, whereas in MMV formulation they appear as 
an additional regularization term, i.e., ||X —BT|| < e', where 
e' can be set by knowing the value of cr. Moreover, the whole 
observation X during the training period has been replaced by 
A = in ([T9), thus, its complexity is independent of the 
training length T. (} 

Improving AML via Concave-Convex Procedure. As the cost 
function L{S) = Lcav(S) 4- Lvex(S) is a sum of a convex and 
a concave function, by slightly modifying algorithm ( [T8] l, we 
can obtain better estimates of the signal covariance matrix S 
even for high-SNR regime via the concave-convex procedure 
(CCCP) ll46l . This consists in running a sequence of convex 
programs iteratively, such that in each iteration a better esti¬ 
mate of the optimal (not necessarily the globally optimal) ML 
solution is computed. Let S^, f = 0,1,..., denote the estimate 
generated at iteration 1. Consider the iteration k, where the 
estimates Si, S 2 ,..., Sj, have already been computed. Given 
the last estimate S^, let Ffc := 1^ + BS^B*^. Then, we can 
approximate the concave function Lcav(S) as 


Lcav(S) = logdet(I,„4-BSB^) 

= logdet(rfe-4B(S-Sfe)B'^) 

= logdetjTfc) -4 logdet(l^ -4 r-'/"B(S - 

< logdet(I -4 BSfcB^) -4 Tr^T'^^^BjS - 
= Lcav(Sfe)-4{B'^rfc'B,S-Sfc), (20) 


where in (a), we used an extension of Proposition Improved in 
Appendix C-A to upper bound log det(Im-4H) for a Hermitian 
PSD matrix H by tr(H). We also define 

Tfc(S; Sfc) := Lcav(Sfc) + (B^r^^B, S - S^), 


and Lk(S;Sk) := Tk(S;Sk) + Lvex(S). It follows that, for 
arbitrary S and St in T_|_, we have L(S) < Lfc(S;Sfc). Also, 
from Proposition W we know that Tfc(S;St) is a tight convex 
upper bound for the concave function Lcav(S) especially 
around S^. Thus, Lfc(S;St) is a tight convex upper bound 
for L(S). To find the next estimate St+i in CCCP, we solve 
the following convex optimization 


Sfc+i = argminLt(M;St). (21) 

mgt+ 

Using Proposition]^ this can also be cast as an SDP Aat can be 
efficiently solved. We initialize the estimates with So = 0 for 
fc = 0. If L immediately seen that To(S; So) = Tr(BSB'^), 
and Lo(S, Sq) = Lapp(S) coincides with the AML function in 
Thus, the estimate Si corresponds to the AML e^imate. 
We can also see that the sequence of estimates {S^j^g 
monotonically improve the likelihood function, i.e.. 


where A = tjA = Cf , where Cx is the sample covariance 
of the input data without any normalization. This can be used 
to directly estimate S. The optimization (|T^ shows a close 


-^(Sfc+i) < Lfe(Sfc_|_i; Sfc) — 


min Lfe(M;Sfe) 

MGT+ 


< Lfe(Sfc; Sfc) = L(Sfc), 


( 22 ) 

(23) 
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where we used the identity Tfe(Sfe;Sfc) = Lcav(Sfc), which 
implies Lk{Sk',Sk) = L{Sk)- As a result, if AML is a 
good approximation of ML, we expect that Si provides a 
good initialization point for the CCCP, such that the sequence 
converges to the ML estimate (globally optimal 

point). 


C. Algorithm 2: MMV with Reduced Dimension (RMMV) 

One of the main problems with grid-based and off-grid 
MMV optimizations in (0 and ([^ is that their complexity 
scales very fast with the sample size T. Here, we propose an 
SVD-based technique as in ll26ll to reduce the computational 
complexity of Q and Consider again the observation 
X = BY during the training period of length T. For the 
time being, assume that the discrete AoA model holds and the 
arrival angles belong to a prefixed grid with elements in the 
interval [— 6 *inax) ^^max]- In this case, the model X = DW-f N 
with the discretized dictionary D = [Ba(0iBa(0G)] 
holds exactly. 

We assume that T ^ m = 0{\/M) and consider “economy 
form” SVD X = where Ym is a T x m 

tall unitary matrix and is the m x m diagonal matrix 
of the non-zero singular values. We define the new data 
X = XVm = UmSm- Notice that X can be simply computed 
from the sample covariance matrix of the data = ^XX*^, 
thus, it is not necessary to store the whole observation X 
during the training time. Moreover, can also be computed 
from X, thus. Proposition implies that X is also a sufficient 
statistics. We also have 

X = DWV^ + NV„ = DW -P N, (24) 


where W and N of dimension Gxm and mxm respectively, 
are the modified channel gains and array noise. It is not 
difficult to check that the reduced observation in ([24li is still in 
the MMV format, in the sense that the matrix W has nonzero 
rows only on the grid points corresponding to the channel 
AoAs. However, the dimension of the problem now is fixed 
and does not scale with T. Of course, since in reality the AoAs 
are not exactly placed on a grid, this method suffers from the 
already mentioned mismatch due to domain discretization. 

Our second algorithm for subspace estimation, referred to 
in the following as Reduced MMV (RMMV), simply applies 
the off-grid atomic- norm minimization for the MMV problem 
reviewed in Section III-DI to the low-dim data X. This can be 
cast as the following SDP 


(S*,W*,Z*) 


subject to 


arg min 

MGT+,WGC™x™,ZeC"x™ 


M Z 

Z^ w 


^0, ||X- 


Tr(M) -f Tr(W) 
BZ|| < e, (25) 


where e is an estimate of the norm of N. For large values of T, 
we expect that « cr^Im + BSB'^ by the consistency of the 
sample covariance estimator, such that the noise components 
in N remain approximately independent and Gaussian. If m 
is sufficiently large, then the optimal value of e concentrates 
around e* = crs/rn? = ma « 2(7s/M, where is the noise 
variance in each array element, and where we used the fact 


that, for coprime sampling, m « 2s/M. The noise level 
at the output of the array elements can be typically estimated 
during the system’s operation. 


D. Algorithm 3.' Super Resolution (SR) 

Let S( 7 ) = 'y{u)si{u)a.(u)^du be the signal covariance 
matrix as in ([igi. It is not difficult to check that, the first 
column of S contains the vector of Fourier coefficients of 
7 given by f := ( 7 , a) := f_^j(u)a(u)du, where = 
[( 7 ,a)]fc := f^^j(u)e-^^^^du, k € [M]. Since 8 ( 7 ) is 
Toeplitz, Proposition yields that for the coprime sampling 
matrix B introduced in Section [TV-AI all the elements of 8 , and 
as a result f, can be identified from B 8 B'^. This implies that 
for a sufficiently large T, we can estimate f accurately using 
the elements of the sample covariance matrix = BCj,B*^. 
Let Xk and Ck — \Xk\ be the covering set and the covering 
number of the element k, as defined in d- We define the 
following estimator for [f]fc 


[fk = 




(26) 


In Appendix in Proposition B-A| we prove that for k ^ 
0 , [f]fc is an unbiased estimator of [f]fc with an approximate 
variance (exact if Ck = 1 ) given by — tcI° converging to 
0 as T — 00 . We propose the following TV-minimization to 
recover the subspace of the signal from the estimates f 


7 * = argmin ||/||tv subject to |l(/,a) - f|| < e, (27) 

where e is an estimate of the norm of the noise in the data. 
For a non-negative measure 7 , the total-variation norm ||7||tv 
is simply given by 'y(u)du = [f]o. Moreover, as f can be 
obtained from the first column of the Toeplitz matrix 8 ( 7 ), 
we can write the optimization ( |27l i directly in terms of the 
covariance matrix: 

8 * =argminTr(M) 
mgt+ 


subject to ||Mei — f|| < 


[M]n), (28) 


where ei = (1, 0,..., 0)^ has dimension M x 1, where [MJn 
is the diagonal element of the Toeplitz matrix M (equivalent 
to [f]o), and where the e parameter in ( |27] l has been replaced 
by an estimate thereof in which ^ « 1 is some parameter that 
can be tuned appropriately. The motivation for ( |28] l is that for 
sufficiently large M and for the true signal distribution 7 , the 
best value of e in ( |27l i can be estimated by 

'|(7,a)-ff ='^|[f]fc-[f]fcp 


k 

= ^Var 

k 


[f]A 


E 

k 

(“) (cr^ 


E 


|[f]fe - [f]fc 

[f]o)^ 


T 


(29) 


where in (a) we used the resul ts pro ved for the variance of 


B-A 


and the fact for those 


the estimate [f]fc in Appendix 
elements with Ck > 1 , the resulting variance is less than 
^ . Thus, we have replaced e in (|27) by its approxima¬ 


tion 


+ [{]o) = \ '^(cr^-|-[M]ii), where an additional 





















tuning by the scaling parameter ^ has been added to include 
the variation of this optimal value around its mean. Algorithm 
( |28| l is a convex optimization that can be solved if an estimate 
of the noise variance cr^ is available. In particular, no prior 
knowledge of SNR is necessary. 

Remark 6: It might happen, especially for small array size 
M, that some of the elements k G \M] are not covered by the 
coprime sampling T), i.e., = \Xk \ = 0. In this case, [f]fc can 

not be estimated for those elements. However, we can still run 
( |27| l or equivalently ( |28l l by including in the constraint only 
the Fourier coefficients corresponding to the elements with 
Cfc > 1. Note that since the optimization is done over T_|_, if 
the number of unobserved elements of f (i.e., those elements 
with Cfe = 0) is negligible compared with M, they do not 
affect the performance considerably. 0 

Remark 7: A coprime sampling scheme similar to ours 
along with TV-minimization has been used in ll^ for DoA 
estimation. Provided the AoAs are well-separated, the esti¬ 
mation algorithm in ll^ can estimate them from the dual 
optimization proposed in EolEIl. However, the authors do 
not use the positivity of the measure (in our case 7 ) that 
naturally arises because of positive semi-definite property of 
the signal covariance matrix. Moreover, in this paper, we deal 
with a wireless scattering channel for which the AoAs may 
be clustered. This implies that the separation requirement for 
the super-resolution setup may not be met. However, since 
our aim is to estimate the subspace of the signal rather than 
AoAs, using the positivity of the underlying measure, we can 
directly solve the primal problem ( [27l i rather than the dual 
one that is used for DoA estimation. In particular, we do not 
need to go through the complicated and error-prone procedure 
of estimating the support (AoAs) via the dual polynomial, as 
required by DoA estimation in ll^ . 0 


E. Algorithm 4.- Covariance Matrix Projection (CMP) 

We dehne the sampling operator sub : —>• 

where sub(K) = BKE*^. Using the operator sub, we can 
dehne a positive semi-dehnite bilinear form on the space of 
M X M matrices by (K, L)b = (sub(K), sub(L)), where the 
latter inner product is dehned in the space of mxm matrices. 
This induces the seminorm ||K||b = \/ (K, K)b|^ It can be 
easily checked that (K,L)b restricted to T+ is indeed an 
inner product, and thus it induces a well-dehned norm. We 
also dehne 


aB{M) = max 


IKII 


kgt IIKIIb 


(30) 


where dependence on M indicates that the maximization is 
performed over the space of all M x M Hermitian Toeplitz 
matrices. The parameter a^iM) is a measure of coherence of 
the sampling matrix B with respect to the space of Toeplitz 
matrices. It is not difficult to check that for the coprime matrix 
B, it holds that 1 < asiM) < \fM. Our analysis shows that 
the CMP algorithm, dehned in the following, performs better 
for sampling matrices B with a smaller 


^Note that ||.||b is not a norm on the space of all M X M matrices since 
we can simply find an M x M matrix K 7 ^ 0 for which ||K||b = 0. 


Let Cx = In order to recover the dominant p-dim 

subspace of the signal, we hrst hnd an estimate of the whole 
signal covariance matrix Cy by 

C: = argmin|lC,-BMB^^||. (31) 

MeT+ 

This is equivalent to the following optimization problem 

C* = argminllCy - M||b, (32) 

M.GT+ 

which implies that the optimal solution C* is given by the 
projection of the sample covariance matrix of the whole array 
signal on T_|_ under seminorm ||.||b- Note that this projection 
is unique since the restriction of the seminorm ||.||b to T+ is 
indeed a norm and the projection theorem holds. 

If an estimate of the noise variance is available, we can 
directly estimate the covariance matrix of the signal via the 
following variant of ( [3^ 

S* = argmin||C^-cr^I^-BMB^II. (33) 

mgt+ 

Once C* or S* are estimated, we use its p-dim dominant 
subspace (for some appropriately chosen p S {1,..., M}) as 
an estimate of the signal subspace. 


V. Performance Analysis and Discussion 


In this section, we provide lower bounds on the performance 
of SR and CMP algorithms. We did not make progress in 
the analysis of AML and RMMV due to the difficulty of 
characterizing the SDP solution. Therefore, this is left as a 
future work. 

For the CMP Algorithm we obtain the following perfor¬ 
mance bound. 

Theorem 1: Consider the signal model given by 0 over a 
training period of length T. Then, for a given pS 
the CMP estimating a p-dim signal subspace achieves perfor¬ 
mance measure Fp (see ( [T^ ) satisfying 

E[rp]>max|l-^^^(1-1-), o|, (34) 

^ i/pvT snr J 

Var[rp]<;^(l + —)^ (35) 

Ir]^ snr 

where rjp is defined in ( [TT| l, and where snr denotes the SNR 
in one snapshot t G [T], □ 

Proof: See Appendix |B-B| ■ 

A result similar to Theorem [1] holds for the SR Algorithm. 

Theorem 2: Consider the signal model Q with the power 
distribution 7 ( 14 ) with an M-dim Fourier coefficients f = 
'j(u)a(u)du. Suppose that ^ in ( [28] l is sufficiently large 
such that f is feasible. Then, for a given p G {1,..., M}, the 
SR estimating a p-dim signal subspace achieves performance 
measure Fp satisfying 


4:j2pf, 1 , 

Tp > 1- 1 + — , 

PpVr snr' 


where snr denotes the SNR in one snapshot t G [T]. 
Proof: See Appendix |B-B| 


(36) 

□ 


Some remarks about Theorem [T] and |2] are in order here. 
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Remark 8: It is seen from Theorem that for T ^ oo, 
E[rp] converges to 1 and Var[rp] tends to 0 , which shows the 
consistency of CMP. Similarly, it is not difficult to check that 
for large T, by taking ^ « 1, the conditions of Theorem hold 
with very high probability. This implies that limT-j-oo Tp = 1 
in probability, yielding the consistency of SR. Thus, for large 
T, the p-dim subspace estimate obtained from both algorithms 
is as efficient as the best p-dim signal subspace. 0 

Remark 9: Both Theorem [T] and |2] indicate that even for 
infinite SNR, one still needs to take some measurements. The 
reason is that, even in the absence of noise, the signal y{t) 
in is stochastic and both estimators need some data to 
discover the underlying signal covariance structure. It is also 
seen that the performance is not very sensitive to SNR when 
the latter is not too small. However, for snr I 0, the required 
training time for achieving a specific target performance 
scales as T = 0{-^). This may indicate that the subspace 
estimation is quite challenging in low-SNR channels such as 
mm-wave. 0 

Remark 10: Let us assume that the signal power is con¬ 
centrated in an a-dim subspace. In this case, rja « 1, and for 
a moderate value of SNR, the required training time scales 
like T = 0{a). Two different models can be considered for 
the signal. In the first model, the effective dimension a does 
not scale by increasing M, thus, the required training length 
is independent of the embedding dimension or the number of 
array elements M. In the second one, the user has a hxed 
angular range A6 = fin, for some /3 S (0,1), for which 
a « /3M scales linearly with M. In this case, the required 
training length scales linearly in M. (} 

VI. Simulation Results 

In this section, we assess the performance of our proposed 
estimators via numerical simulations. The computationally ef- 
hcient implementation of the proposed optimization problems 
is beyond the scope of the present work. Therefore, here we 
use the general-purpose CVX package BtI for running all the 
convex optimizations. 

A. Comparing the Performance of Subspace Estimators 

We consider an array of size M = 80 and 0„iax = 60 
degrees (corresponding to an angular sector of 120 degrees). 
We use a coprime sampling with qi = 7,q2 = 9, where we 
denote the set of indices of the sampled antennas with T>, 
where \T>\ — 19. Although there are still some array indices in 
[M] = {0,..., 79} not covered by AT>, the simulations show 
that the estimators are quite insensitive to the presence of a 
few unobserved elements. We also assume that only m = 20 
RF chains are available at the BS, which would be enough to 
implement the coprime sampling, and to serve up to 20 data 
streams in the UL or DL. 

As an example, we consider a scattering channel with AoAs 
in the range 0 = U [02,0'f\, where 9i = —50, 0} = 

—40, 02 = 10, and 02 = 20 degrees. We assume a uniform 
power distribution over 0, thus, the total angular support is 
20 degrees. The AoA scattering function "f{u) is given by 

} 0 otherwise. 


where Ui = sin( 0 i) and u' = sin( 0 '), for * = 1 , 2 , and where 
K > 0 is a normalization constant. We calculate the vector 
of Fourier coefficients f = f_^j(u)a(u)du, from which we 
construct the Toeplitz signal covariance matrix S. The i.i.d. 
channel vectors in each training period are generated according 
to h = and the corresponding observation at the array 

antennas is y = h -f crn 2 , where ni,n 2 ~ CA7(0,Im) are 
independent vectors, and where cr^ denotes the noise variance. 

We compare the performance of each subspace estimator 
with the optimal beamformer that captures more than 95% 
of signal’s power (i.e., pp = 0.95), from which we obtain 
the required dimension p. We estimate the efficiency of 
each estimator, denoted by Fp, via numerical simulations. To 
compare the performance of our algorithms with the state of 
the art, we have selected three candidate algorithms that we 
have also reviewed in Section |II] 

• PETRELS. We have implemented the algorithm intro¬ 
duced in with the difference that here we fix the 
data size. Hence, in every step of the algorithm we select 
a training sample at random from the fixed training set 
and update the estimated signal subspace. 

• Nuclear-norm minimization. We run the optimization 
problem given in (|^. Here we optimistically assume that 
the algorithm knows the best value of e, given by e* = 
II Ca; — BCj,B*^||, although in reality this would not be 
available and must be estimated. 

• MMV Algorithms. We compare our algorithms with the 
state of the art MMV methods as summarized in Section 
|II-D| The first algorithm runs the optimization introduced 
in ( 0 , where we consider a quantized grid of size 
G = 3M equally spaced AoAs. We call the algorithm 
grid-based MMV (GBMMV). The second one is based 
on the off-grid techniques given by the optimization (|^. 
A byproduct of this optimization is to directly obtain 
an estimate of the covariance of the data C* = M*, 
given by the matrix M* that achieves the minimization in 
(§. Then, we extract the dominant p-dim subspace from 
such estimate. We call this algorithm grid-less MMV 
(GLMMV). We set e' to its optimal value given by £ 2 - 
norm of the noise in subsampled observations X. 

Performance vs. signal-to-noise ratio. Fig. compares the 
performance of our proposed algorithms with the ones in the 
literature for a range of SNR. The curve corresponding to each 
algorithm is obtained by averaging over 20 independent runs. 
It is seen that AML, RMMV, and SR perform comparably 
with the GLMMV but they have much lower computational 
complexity, which in particular does not scale with T. The 
performance of CMP is as good as GBMMV and better than 
Nuclear-norm minimization especially for higher SNR, but its 
complexity is much lower than GBMMV especially for large 
T. PETRELS does not perform very well for the hxed data 
size, e.g., its performance even for T = 800 is worse than the 
other algorithms. 

Performance vs. training length T. Pig. [^compares the scal¬ 
ing performance of our proposed algorithms and Nuclear-norm 
minimization for different training lengths. As the performance 
of AML and RMMV is comparable with the GLMMV and 
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Performance of different estimators vs. SNR for T = 100 



Fig. 2; Performance comparison of various subspace estima¬ 
tors versus the received SNR for training length T — 100. 
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Fig. 3: Scaling of the performance of different estimators with 
training length T € {200,400,800,1600} 


better than GBMMV and since, for large training length T, 
these algorithms run very slowly, we have not included them in 
this figure. The results generally show that for moderate range 
of SNR, the performance of the proposed algorithms improves 
considerably by increasing T. However, for low SNR, the 
resulting improvement is quite negligible. This confirms the 
comment made in Remark about the scaling behaviour of 
training length T with snr. 


B. JSDM with Subspace Estimation 

Basic Setup. As explained in the Introduction, our main 
motivation for estimating the users’ signal subspaces comes 
from JSDM ||7}0, where the users are grouped/clustered based 
on the similarity of their signal subspaces so that they can 
be served efficiently by the BS. In this section, we demon¬ 
strate the performance of our proposed subspace estimation 


algorithms included as a component of a JSDM system. In 
particular, we consider a setup closely inspired by la, where 
users have a uniform AoA scattering function over an interval 
located in the angular sector [—^max, ^^max] covered by the 
BS. Users are grouped by a Grassmannian quantizer with a 
fixed and pre-defined set of quantization points. Following the 
approach in IS], we fix the Grassmannian quantizer such that 
each quantization point is a subspace spanned by a group of 
adjacent columns of the M x M Discrete Fourier Transform 
(DFT) matrix. Once the users are partitioned into groups, a 
fixed number of users per group is served by JSDM with per- 
group processing (PGP) (see details in 171). The main motiva¬ 
tion of this paper is the HDA low-complexity implementation 
of JSDM, for which the number of RF chains (and A/D 
converters) at the BS side is limited to m ^ M. Therefore, 
both the estimation of the users’ subspaces and the estimation 
of the per-group effective channels, including the JSDM pre¬ 
beamforming, must be obtained by sampling no more than 
m analog signal dimensions. In our example, we considere a 
ULA with M = 80 elements and 0niax = 60 degrees, and the 
same coprime sampling as in Section |VI-A[ and we assume 
that the BS can only sample m = 20 analog RF demodulated 
signals. While in il it is assumed that the users’ channel 
covariance matrices are ideally known, and the grouping by 
Grassmannian quantization is performed on the subspaces 
extracted from the SVD of such covariance matrices, here 
we estimate the users’ subspace using our algorithms, from a 
block of T i.i.d. noisy channel snapshots captured during the 
training period, as explained before. Then, we apply the same 
grouping scheme and DFT pre-beamforming scheme to both 
the estimated case and the ideally known case, and compare 
the results in terms of achieved total sum-rate, averaged over a 
sufficiently large number of independent channel realizations. 
Signal Model. We considered a collection of users 1C = 
{!,...,AT} of size K = \IC\ = 200. The AoA scattering 
function of user fc is a uniform distribution in the interval 
[9k — ^§^,6k + corresponding to the following power 

distribution in the u-domain (recall that u = , ): 

f X „ p r sin(e;,-^) sin(efc+^|M] 

'yk{u) = < ’ sin(6l„ax) J’ 

[ 0 otherwise. 


where x > 0 is a normalization constant. The users’ angular 
spread (same for all users) is A9k = A9 = 20 degress, and the 
center-AoA 9k is i.i.d. and uniformly distributed in [—0max + 
^max — ^]- Letting Auk indicate the support size of the 
power distribution ^k{u), we define Au = maxk^ic Auk as 
the maximum support size of the users in the w-domain, where 

A ^ 2sin(Ae/2) _ Q ^ 
sin(6»„,ix) 

User Grouping. The Grassmanian quantizer is obtained by 
following il. We divide the domain u G [—1,1] into inter¬ 
twined groups (/ = Gj, with G = 9, as illustrated in 

Fig. We denote the u-support of group g G G with Ug C 
[—1,1]. Since the length of Ug satisfies \Ug\ « Au = 0.4, 
due to the intertwined structure of the groups (see Fig. |^, we 
expect that 7 fe(u) for each user k is well-aligned with at least 
one Ug. The signal subspace for group g G G simply given 
by a submatrix Fg of the M x M DFT matrix as follows. We 
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Fig. 4; Grouping of the users in the u-domain. 

define the discrete = {—1, —\ + ^,..., 1 — ^}, and 

set the columns of Fg to the normalized array steering vectors 
for all u € n Ug. In our case, the matrix Fg 
has dimension M x qg with Qg = M/5 = 16. Therefore, the 
JSDM pre-beamforming matrices Bg (see notation in ||7l) are 
simply given by Bg = Fg 

We cluster the users into groups in Q, where each user 
k G JC is assigned to the group ik G G, where ik = 
argmaXggg (S^, FgFg) corresponds to the group whose sub¬ 
space captures the maximum amount of power in S^. This 
can be seen as an improved weighted version of the chordal 
distance between subspaces used in i), and reduces to H 
when the non-zero eigenvalues of Sk are constant. As said 
before, we apply the same clustering/quantization scheme both 
to the ideally known set {Sfc} and to the estimated set of user 
covariances {S^}. In this example, we used the SR Algorithm 
in ( |28| l with tuning parameter ^ — 2. Since the performance of 
our proposed algorithms in terms of Fp is similar (see Section 
Vl^ , we expect that also the other proposed algorithms yield 
similar results in terms of JSDM sum-rate as the ones reported 
here for SR. 

Beamforming and Scheduling. After clustering the users, 
JSDM with PGP is applied. Following ID, we divide the 
groups G into two subsets Gi = {1,3,5,7,9} and G 2 = 
{2,4, 6, 8} with intertwined angular supports (see Fig.|^, such 
that within each subset the groups have mutually orthogonal 
matrices Fg. The two group subsets are served in orthogonal 
resource blocks, while the groups within each subset are served 
using spatial multiplexing. 

For clarity, we focus on subset Gi while the scheme for G 2 
follows immediately. Let hg^ be the channel vector of the £-th 
scheduled user in group g G Gi, and let Hg = [hg^^, • • •, hgg ] 
and H = [Hi, H 3 , ... ] be the channel matrix of the scheduled 
users in group g, and the channel matrix of all groups 
combined in a given coherence block. The BS wishes to serve 
S — ^9 Streams (users), where Sg < qg is the 

number of users scheduled in group g. We suppose that the 
scheduled users in group g are selected randomly from the 
set of users assigned to group g by the grouping algorithm. 


T = 13 


T = 25 




T = 50 


T= too 




Fig. 5: Sum-rate vs. SNR performance of a JSDM with PGP 
with exact subspace knowledge (PGP), and comparison with 
a JSDM with PGP with subspace estimation (PGP-SE) for 
training lengths T G (13,25,50,100}. 


Since we have only m = 20 RF chains, we set Sg = 4, for 
g G Gi (Sg = 5 for p G G 2 )- The JSDM two-stage precoder 
is given by V = UR, where U = [Fi, F 3 , ..., FgJ is the 
M X q DFT pre-beamforming matrix with q = X^gGPi ^Ig’ 
where R G is the baseband (implemented in the digital 

domain) MIMO multiuser precoding matrix in block-diagonal 
form according to the PGP scheme. The MIMO precoding 
matrix R depends on the instantaneous realizations of the 
reduced dimensional projected channel matrices Hgg = F^Hg 
for g G Gi- The scheduled users in each group are served using 
linear zero-forcing beamforming (ZFBF) matrix obtained as 
the column-normalized version of the pseudo-inverse of the 
projected channel matrix JHgg. Further details can be found in 
Q and are omitted here due to space limitation. 

Sum-rate performance and simulation results. Let U and 

R be the pre-beamforming and the MIMO precoding ma¬ 
trices, and let d G C'^ be the S'-dim vector of S symbols 
corresponding to S data streams. We normalize the symbol 
energy and the users’ channel vectors such that E[dd*^] = I 5 
and E[||hfc|p] = 1, for all k G JC. When d is transmitted 
in the DL, the received signal at the user side is given by 
y = H'^URd -f n, where n ^ CAf{0, ^Is) is the S'-dim 
vector of additive noise. This is equivalent to an S' x S' MIMO 
multiuser interference channel given by the transfer matrix 
T := H'^UR. Treating the interference as noise, the signal- 
to-interference-plus-noise (SINR) for the data stream s is given 
by 


sinr^ 




1 ' 
snr 


(38) 
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The corresponding achieved sum-rate in the block is given by 

^ 1 

C"""’(H,snr) = ^ - log 2 (l + sinr^). (39) 

In simulations, we evaluate snr) for 1000 independent 

realizations of the channel matrix H, and plot the average sum- 
rate, which yields the “ergodic” rate achieved via coding over 
many fading blocks, and ideally adapting the rate of the data 
streams in each block. For simplicity, in our simulations we 
have used the same value snr both for DL data transmission 
and for the UL training, in order to estimate the users’ 
subspaces. In practice, UL and DL SNRs may differ. However, 
this does not change the qualitative conclusions of our results, 
since we can always compensate for a lower UL SNR by 
increasing the subspace estimation training length T. In order 
to clearly put in evidence the effect of the subspace estimation, 
we calculated the achievable sum-rate under the assumption 
that the projected channel matrices are perfectly known 
at the BS. In general, a dedicated UL training per group 
must be implemented, from which is estimated via UL- 
DL reciprocity 15] 16). We did not include here the projected 
channel estimation phase since it incurs a small performance 
degradation with respect to the ideal case in the practically 
relevant range of SNR (see analysis in Q), and would have 
the effect of “blurring” the dependence of the performance 
with respect to the subspace estimation, which is the focus of 
this paper. 

Fig. 0 shows the JSDM sum-rate vs. SNR for the system 
described before. It is seen that subspace estimation even for 
very short training length T does not incur any significant loss 
in terms of achievable rate. 

VII. Conclusion 

In this paper, we studied the estimation of the user signal 
subspace in a MIMO wireless scenario where a transmitter 
(user) with a single antenna sends training symbols, and a 
receiver (BS) equipped with an array with M antennas obtains 
noisy linear sketches of the corresponding channel vector, 
modeled as an i.i.d. (in time) and correlated (in the antenna 
domain) vector Gaussian random process. Our algorithms 
require sampling only 2'/M antenna array elements and 
return a p-dimensional estimated subspace capturing almost 
the same amount of signal power as the best possible p- 
dimensional subspace. We analyzed the performance of some 
of the proposed algorithms and compared it with that of state- 
of-the art methods in the literature via numerical simulations. 

We also illustrated in details how the users’ subspace 
information can be exploited in a JSDM hybrid digital-analog 
implementation of a massive MIMO system. We provided 
numerical simulations showing that in terms of the achieved 
ergodic sum-rate, such a JSDM system in which the users’ 
subspaces are estimated through our proposed algorithms 
performs almost indistinguishably from a scheme that as¬ 
sumes ideal knowledge of the users’ channel covariances. 
This indicates that the proposed algorithms are well suited 
for JSDM with HDA implementation with a small number 


m M of RF chains (and A/D converters), where the group- 
separating beamformer is implemented in the analog (RF) 
domain, and the multiuser precoding is implemented in the 
digital (baseband) domain. 

Appendix A 

Overview oe Complex Gaussian Random Variables 

In this appendix, we review some of the properties of the 
complex circularly symmetric Gaussian random variables that 
we need in this paper. 

Proposition 6: Let X = + jXi and Y = Yr + jYi be 

two zero-mean unit-variance circularly symmetric complex¬ 
valued Gaussian variables with a correlation coefficient p = 
Pr Y jPi- Then we have: 

1) e[a:x*] = E[yy*] = i, and e[a:x] = E[yy] = 

E[A:y] = 0. 

2) E[x2] = E[xf] = E[y,2] = E[y/] = i. 

3) E[A:^X,] = E[YrYi\ = 0, = E[X,Yi\ = 

and E[XrYi\ = -E[X,y^] = f. 

We also need the following proposition for real-valued Gaus¬ 
sian variables. 

Proposition 7 (Price’s Theorem (1451/ ).- Let Z and W be 
two real-valued A/'(0,1) Gaussian variables with a covariance 
p. Let g{z,w) be a differentiable function of (z,w), and 
/(p)=E[ 5 (Z,JU)].Then | J = E VL)]. □ 

Using Proposition]^ we can prove the following result. 

Proposition 8: Let X and Y be as in Proposition Then, 
we have 


1) E[XfY^^] = E[X2y7] ^ 

E[XjY?] = 

2) E[|A:r*|2] = E[|x|2]E[|yp] 


l+2p‘ 


and E[Xfr/] = 


|E[A:y*]|2. 

Proof: For simplicity, we prove only one of the iden¬ 
tities in part 1. Let us consider E[Ar^yj^]. Note that from 
the properties of the complex Gaussian variables, it results 
that {Z,W) = {y/^Xr-is/^Yi) are jointly Gaussian N(0,1) 
random variables with covariance pi. This implies that g{pi) = 
^[XfY^] = E[Z'^W"^\ is a function of pi. Applying the 
Price’s theorem in Proposition we have 


—g = AE\ZW] = Ap, 

dpi 


(40) 


which implies that g{pi) = 2pf + k, where k is a constant. 
For Pi = 0, the random variables {Z^W) are independent from 
each other, and ^(O) = E[Z^1U^] = E[Z^]E[IU^] = 1, which 
implies that k = 1. Hence, we have g{pi) = 2p^ + 1, which 
implies that E[Arj^J4^] = . 

To prove part 2, note that IXY*]"^ = (X^ + Xf){Y^ +Y^) 
can be expanded into four terms whose expected values can 
be computed using Price’s theorem, where we obtain 

E[|yF*p] = 2(i^^) + 2(i^^) = l + |pp. (41) 

Moreover, we have 

E[XY*] = E[XrYr + XfY,] + jE[XiYr - XrY,] 

= 2(y) + j2(-^) = - jp, = pL (42) 

The result follows from ( |4T] ), ( |42| ) and E[|Arp] = E[|yp] = 1. 
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Appendix B 

Analysis and Proof Techniques 
A. Statistics of the Subsampled Signal 

From the signal model in (|^, it is seen that the received 
signal y{t) is a complex Gaussian vector with covariance 
matrix Cy = 8 ( 7 ) + Since we assume that y{t) is 

independent across different snapshots t G [T], this completely 
specifies its statistics. Similarly, it results that the statistics 
of the sketches x(<) = By(t) is fully specified with the 
covariance matrix = BCyB*^. For the coprime sampling 
matrix B, and for i,jG m} with i > j, we obtain 

[Cx]i,j = [S(7)]di.dj +<j'^Sij 

= [(\d,-dj + := [g]d,-dr ( 43 ) 


where di G T) i s the z -th largest index of the sampled antennas 


as in Section 


IV-A 


and where [f]fc = 7 (u)e^^'^“dit 

denotes the fc-th Fourier coefficient of 7 . Notice that the SNR 


is simply given by snr = 

Now consider a specific k G [M] and let di,dj 


such that k = dj — d-i. Since, as in Section IV-A 


G V he 
we assume 
and exist. 


that I? is a complete cover for [M], such d.^ miu u,j 
Let us also define [gj^ = [Cx\i,j- The following proposition 
charactereizes the mean and the variance of [g]fc. The proof 
uses the properties of the complex Gaussian variables reviewed 
in Appendix [A| 

Proposition 9: Let k G [M] with k = di — dj, and let 


]fc = [Cx]i,j- Then, we have E 


= [g]fe,Var 


((T^-|-[f]o)^ _ (T^(l-l-snr)^ 

T T 

Proof: Taking the expectation, we have 


[gjfc 


□ 


E 


[gJfc 


= E 




— —d,- 4“ rr Sij — [g]fc- 


( 44 ) 


Since the observations y{t), and as a result x(f), are indepen¬ 
dent for f G [T], and 

T T 

i ^[x(t)]4x(i)]‘ = i (45) 


it results that Var 


= 4Var 


[y(i)]<i.[y(0]l]- Hence, 


using the properties of the complex Gaussian variables proved 
in Proposition we obtain 


Var[[y(t)]dJy(f)]3j 

= E[|[y(^)]dJy(i)]dj|^] - |E[[y(i)]d,[y(^)]S,]| 

= E[|[y(i)],.|^]E[|[y(t)]d,|^] +|E[[y(^)],,[y(^)]5J|' 

- |E[[y(i)]di[y(i)]dJ I = ([f]o-f = cr‘‘(l-hsnr)^ ( 46 ) 


where snr = 
snr) 2 . 


Ifk 

^2 


as before. Hence, Var 



‘(1 


B. Analysis of the Performance of the CMP and SR Algorithms 

In this section, we analyze the performance of the CMP and 
SR. First, we need the following preliminary results. 


Proposition 10: Let Cy be the sample covariance of the 
signal y{t), t G [T], and let C* be the CMP estimate 
given by iB or equivalently ( |32j i. Let C G T_|_ be an 
arbitrary Hermitian PSD Toeplitz matrix. Then max|||Cy — 

C;|iB,||C;-C'||B}<||C,-C1|B. ^ □ 

Proof: The inequality ||Cy — C*||b < IlCy — C'||b simply 
follows from the definition of C* as the projection of Cy 
onto the space T_|_. Thus, we only need to prove the other 
inequality ||C* — C'||b < IjCj, — C'||b- To prove this, note 
that the seminorm |j.||B is defined from a PSD bilinear form. 
As C itself belongs to T+, it is not difficult to see that at the 
projection C*, the vector C — C* is a feasible direction to 
move because from the convexity of the space T_|_, it results 
that C*-|-a(C' —Cp G T+ for any a G [0,1]. Thus, from the 
optimality of C*, it results that (Cy — C*, C' — C*}b < 0. 
This implies that 

l!c,-c'|i| = ||c,-c; + c;-c'|i| (4?) 

= lic,-c;ii| + ||c;-c'||| (48) 

-2(c,-c;,c'-c;)b (49) 

> lic,-c;i|| + ||c;-c'|i|, (so) 

from which the desired inequality |jC* —C'||b < ||Cj, —C'||b 
results. Combining the two inequalities, gives the proof. ■ 

Proposition 11: Let C* and Cy be as defined before. 

Suppose ||C* — Cj,||b < e and let V G M{M,p) be an 

arbitrary M xp matrix with V'^V = Ip. Then, |(Cp, VV*^) — 

{c;,VY»)\ < eV¥M- □ 

Proof: Using the Cauchy-Schwartz inequality, we have; 

l(Cp, VV«) - (c;, vv^^)| = |(Cp - c;, vv»)| 

<||Cp-C;||y/Tr(VVHVVH) 

= l|Cp-c;iivji 

< aB(M)||Cp-C;||BVP<e^^M, (51) 

where in (o) we used the coherence parameter of the coprime 
matrix asiM) < s/M. ■ 

After finding the projection C*, we use its p-dim dominant 
subspace to design a beamformer matrix for the received signal 
y{t), t G [T]. Let CJ = UAU'^ be the SVD of CJ, and let 

V G p) be the matrix consisting of the first p columns of 

U. If the estimate C* is very close to the Cy, then we expect 
that V, in terms of capturing the signal power, be a good 
approximation of the dominant p-dim subspace of the signal. 
This has been formalized in the following propositions. 

Proposition 12: Let C* and Cy be as defined before and 
let S be the signal covariance matrix, where we have Cy = 
S-Uct^Im. Assume that ||C;-Cp||B < e. Let Cy = UAU'^ 
and C* = UAU*^ be the SVD of Cy and C* respectively. Let 

V and Y he M xp matrices consisting of the first p columns 
of U and U. Then, |(S, VV^) - (S, VV^)! < 2eyfdM. □ 

Proof: First note that Cy and S differ by a multiple of 
identity matrix 1m- As (Im,VV'^) = (Im,VV'^) = p, we 
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can equivalently prove the following inequality | {Cy, VV*^) — 




< 2€y/pM. 


From the triangle inequality, |(Cj,,VV'^) — (Cj,,VV*^ 
can be upper bounded by 

|(C„ VV'^) - (C* VV^)| + |(C* VV») - (C„vv»)| 


(C„VVH) 


,VV'^)^Con- 

,VVH)| 


is 2ev^pM-optimal for S. 

We also need the following result. 

Proposition 13: Let C* and Cy be as before. Then 


E 


\c;-Cy 


< 


2Ma^{l 


■ snr 


which also implies E 


|c;-c,i 


Proof: First notice that, we have 


T 

^ \/2M(l+snr) 

<- ■ ^ 




(a) 


E 

llc;-Cpii^ 

< E 

IlCp-CpIll 


M-1 

E 

k=0 

M-1 


CfeE 


[gjfe - [gjfc 


= ^ CfeVar 


[gJfe 


k=0 

m{m + 1) a'^{l + snr)^ 


(b) cr^(l + snr)^ 

Y 


M-1 

E 

fc=0 


Ck 


2MY{1 


snr 


10 


identity |e 


ic;-c,ii 


< E 


ic;-c,ii| 


E[Tp] = 1 - E 


= 1 - E 


H\n 


■(S,VVH) - (S,VV 
. (S,VVH) . 
r|(S,VV^*) - (S,VVH 


(S,VVH) 


(j,)^ E[|(C„ VV»)- 


(C„VVH) 


(c) 

> 1 - 


p,Tr(S) 

2VPte[||c;-cj|b] 


VpM[f]o 


Using Proposition 11 the first term is less than e^/pM. For 
the second term, note that 0 < (Ct,,VV'^) < 


2Yy/^M{l + snr) 
- PpM[{]oVT 

1 


because from the SVD, V is the best p-dim beamformer for 
Cy. Similarly, we have 0 < (C;, VV'^) < (CJ, " " 
sequently, we can always make |(C*,VV'^) — {Cy, 
larger by either changing V into V or vice-versa. This implies 
that |(C*,VV^) — (Cy,VV^)| is always smaller than the 
maximum of |(C;, VV'^) - (C^, VV'^)|_and |(C;, VV'^) - 
(Cj,,VV*^)|, where from Proposition lljJ both terms are 
smaller than e\/pM. Combining with the first upper bound, 
we have 


Vp'^T 


), 


(60) 

( 61 ) 

(62) 

(63) 


where in (a) we used (S, VV'^) < (S, VV^), in (6) we used 
Cy = S + YIm and Tr(VV^^) = Tr(VVH), in (c) we used 


Proposition 12 and finally in (d) we used Proposition 13 
As Fp > 0, this implies that E[rp] > max|l — + 

^), o|. In a similar way, we obtain 


|(C;,VVH)-(Cp,VV»)| <2e\/pM, (52) 

which is the desired result. ■ 

Remark 11: Notice that V is the optimal p-dim beam- 
former for S (or equivalently Cy). Proposition [T^ implies that 
the optimal beamformer for the estimate covariance matrix C* 


Varpp] = Var 


Var 


KS.VV^) - (S,VV») 


(S,VVH) 

|(Cp,VVH)-(Cp,VV«) 


E 


< 


r?2Tr(S)2 

|(Cp,VVH)-(Cp,VV 


H\|2 


0 


(53) 


□ 


(54) 


(55) 


(56) 


(57) 


E 


< 


ry2Tr(S)2 

(2VPT||C;-CpHb 


2n 


p2Tr(S)2 

4pME[||C;-Cp|||] 


?72M2[f]2 
(a) 8pM^(snr + 

= Tp 2 M 2 [f ]2 


8p 


1 


^ 2 ( 1 +—)’ (64) 

snr 


where in (a) we used Proposition 13 and the fact that snr = 
This completes the proof. 

Proof of Theorem]^ Let f be the Fouri er coefficient of 7 and 
let f be its estimate as in Section IIV-DI Let S* be Hermitian 
Toeplitz matrix obtained from the optimization ( |28] l and let 
us denote its first column by f*. Note that Tr(S*) = M[f*]o. 
We suppose that the parameter ^ is tuned largely enough such 
that the true f is feasible. Also, from the optimality of f*, it 
results that [f*]o < [f]o- Thus, we have both inequalities 


2 T T 

where in (a) we used the inequality proved in Proposition 


\r-i\\<^i-{Y + 


by replacing C' = Cy, where [gj^ are as in ( |4^ , where 
Ck denotes the covering number of fc S [M] by the coprime 
sampling T> with m =\T>\ = 0{2 Ym), and where (6) results 
from Proposition The other result simply follows from the 


|f-f|i<W747(^^ 


T 

M, 

T 




[f]o). 


[f]o) (65) 


( 66 ) 


Proof of Theorem Using the definition of Fp and taking 
the expectation value we obtain 


(58) 


(59) 


Applying the triangle inequality, we have l|f - r|| < 

2^^J^Y{1 + snr), where as before snr = Using the 
Toeplitz property, we obtain that 

||S - S*|| < ^TMIIf - r|| < + snr). (67) 

vT 

Let V and V be M x p matrices whose columns span the 
dominant p-dim signal subspace of S and S* respectively. 
Using a result similar to Proposition 0 and[T^ we obtain 

|(S, VV^) - (S, W»)| < 2V^||S - S*|| (68) 
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-(1 + snr). (69) 

Dividing both side by (S, VV*^), using the definition of Fp in 
and the definition of rjp in ( [TT] i, and using the fact that 
Fp G [0,1], we have 


Fp > 1- 




(1 + 



(70) 


This completes the proof. 


Appendix C 

Proofs of the Propositions 
A. Proof of Proposition [^ 

Note that BSB*^ is a PSD matrix. We prove the following 
more general statement that for any m x m Hermitian matrix 
H for which 1^ + H is PSD, we have logdet(Im + H) = 
Tr(H) + o(Tr(H)). Let UAU'^ be the SVD of H. Then, 

logdet(I„ + H) = logdet(I„ + UAU'^) 

m 

= log det(I^ + = X! 

m 

= ^ A, + o(Tr(H)) = Tr(H) + o(Tr(H)). (71) 

i=i 

In particular, from the concavity of the Logarithm, it results 
that log(l + A^) < Xi- This implies that logdet(Im + H) < 
Tr(H). This complete the proof. 


B. Proof of Proposition [^ 

^ Let (S*,W*) be the output of the SDP ( [TSl l, md let 
Sopt be the AML estimate given by the optimization Sopt = 
argmingg.j,^ Tapp(S). Let us define Hopt + BSoptB'^ 

and Wopt := A'^H'ptA. Using the well-known Schur 
complement condition for positive semi-definiteness (see ii 


p.28), it results that 


Hopt A 
A^ Wopt 


^ 0, which implies 


that (Sopt, Wopt) satisfy the SDP constraint in ([T^. Hence, 


iapp(Sopt) = Tr(BSoptBH) + Tr(C2(I„ -h BSoptB^^)"!) 

Tr(BSoptBH) -f Tr(A^^H-plA) 

= Tr(BSoptBH)+Tr(Wopt) 

> Tr(BS*B”) -f Tr(W*) 

(b) ^ _ _ _ 

> Tr(BS*B'^) -f- Tr(A'^(I™ -t- BS*B'^)-1A) 

= iapp(S*), (72) 


where in (a), we use = AA'^, and where in (6), we apply 
Schur complement condition to the SDP constraint to obtain 
W* ^ A^(Im + BS*B^)“^A, and take the trace of both 
sides to obtain the desired inequality. As Sopt is the AML 
estimate, we also have Lapp(Sopt) < Lapp(S*), which together 
with ^T2\ completes the proof. 
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